This document describes fitting a hierarchical ordination to data, including code and the nasty details that are needed.
The data are included in the ‘mvabund’ R package, but were originally collected by Gibb et al. They are the observed abundances of 41 ant species at 30 sites in Australia, along with 7 environmental variables and 5 traits. Our question is how the environmental variables and traits affect the community composition, and whether they interact.
The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:
library(nimble)
library(nimbleHMC)
library(mvabund)
data("antTraits")
Y <- antTraits$abund # abundance data of sites by species
X <- scale(antTraits$env) # environmental variables of sites by predictors
TR <- scale(model.matrix(~0+.,antTraits$traits)) # species traits
NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors
# Create data lists for Nimble
dat <- list(Y = Y, X = X, TR = TR)
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)
We assume that the counts follow a Poisson distribution with a log link function (as we would do in a standard GLM). We assume a species effect (\(\beta_{0j}\) for species \(j\)), and the rest is the hierarchical ordination, so the model for the log of the mean abundance (\(\eta_{ij}\)) is
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \] The idea here is that \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings, as in a typical ordination. But here we assume that they each have a variance of 1, so that \(\boldsymbol{\Sigma}\) holds the scales of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to 0, this suggests that the corresponding latent variable has (almost) no effect.
This is the same as a classical GLLVM, but here we go further by modelling both \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\). As a simplification, we can think about this as simply a regression of \(\boldsymbol{z}\) (the site effects) against the environmental variables and of \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be included.
We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\), or as a matrix as \(\boldsymbol{Y}\). The environmental predictors are \(X_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and the \(t = 1\ldots T\) traits are stored in \(\boldsymbol{R}\). We then assume
\[Y_{ij} \sim \text{Pois}(\lambda_{ij}),\]
with
\[\log(\lambda_{ij}) = \eta_{ij},\]
so that \(\eta_{ij}\) is the linear predictor, which we will model with the hierarchical ordination:
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j \]
We then model \(\boldsymbol{z}_i\) (as in van der Veen et al (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:
\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i \] and
\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{R}_{j} + \boldsymbol{\varepsilon}_j, \] where \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) are matrices of coefficients for the environmental variables and the traits, and \(\boldsymbol{\epsilon}_i \sim N(\boldsymbol{0}, \text{diag}(\boldsymbol{\sigma}^2))\) and \(\boldsymbol{\varepsilon}_j \sim N(\boldsymbol{0}, \text{diag}(\boldsymbol{\delta}^2))\) are the site and species residuals.
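To make the generative model concrete, here is a small simulation from it. All dimensions and parameter values are arbitrary choices for illustration (they are not estimates from the ant data), and the standardisation of the scores and loadings is skipped:

```r
set.seed(1)
n <- 30; p <- 41; K <- 7; nTr <- 5; d <- 1      # sites, species, predictors, traits, LVs
X  <- matrix(rnorm(n*K), n, K)                  # environmental predictors
TR <- matrix(rnorm(p*nTr), p, nTr)              # traits
B     <- matrix(rnorm(K*d, sd = 0.2), K, d)     # environmental coefficients
omega <- matrix(rnorm(nTr*d, sd = 0.2), nTr, d) # trait coefficients
Sigma <- diag(0.7, d)                           # scale(s) of the latent variable(s)
z     <- X %*% B + matrix(rnorm(n*d, sd = 0.3), n, d)       # site scores
gamma <- TR %*% omega + matrix(rnorm(p*d, sd = 0.3), p, d)  # species loadings
beta0 <- rnorm(p, -0.5, 0.5)                    # species intercepts
eta   <- matrix(beta0, n, p, byrow = TRUE) + z %*% Sigma %*% t(gamma) # linear predictor
Y.sim <- matrix(rpois(n*p, exp(eta)), n, p)     # Poisson counts
```

Every part of the model we fit below appears here: the two regressions produce the site scores and species loadings, and their outer product (scaled by \(\boldsymbol{\Sigma}\)) enters the linear predictor.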
We fit the model with the \(\texttt{Nimble}\) package. We start with a single dimension for simplicity, so that we can show the steps needed.
This is the code for the model with one dimension:
OneLV.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
      # These three lines specify the model:
      Y[i, j] ~ dpois(lambda[i, j])     # Likelihood: Y follows a Poisson distribution
      log(lambda[i, j]) <- eta[i, j]    # Log link function
      eta[i, j] <- beta0[j] + gammas[j]*LVsd*zs[i] # Linear predictor: species + LV
    }
    # Site scores: regression against the environmental variables
    # The Bs are the coefficients of the environmental effects
    epsilon[i] ~ dnorm(0, sd = Sitesd)  # Residual site effect
    XB[i] <- inprod(X[i, 1:NEnv], B[1:NEnv])
    z[i] <- XB[i] + epsilon[i]
  }
  for (j in 1:NSpecies) {
    # Species loadings: regression against the traits
    # The Os are the trait effects
    omegaTR[j] <- inprod(TR[j, 1:NTraits], O[1:NTraits])
    varepsilon[j] ~ dnorm(0, sd = Speciesd) # Residual species effect
    gamma[j] <- omegaTR[j] + varepsilon[j]
    beta0[j] ~ dnorm(0, sd = 1)             # Species intercept
  }
  # Standardise z and gamma, so they both have a variance of 1:
  # the standard deviation of the latent variable is then LVsd
  zmu <- mean(z[1:NSites])
  zs[1:NSites] <- (z[1:NSites] - zmu)/sqrt(mean((z[1:NSites] - zmu)^2))
  gammamu <- mean(gamma[1:NSpecies])
  gammas[1:NSpecies] <- (gamma[1:NSpecies] - gammamu)/sqrt(mean((gamma[1:NSpecies] - gammamu)^2))
  # Priors for the scale parameters
  Sitesd ~ dexp(1)
  Speciesd ~ dexp(1)
  LVsd ~ dexp(1)
  # Priors for the coefficients (the sign constraint is imposed in post-processing)
  for (k in 1:NEnv) {
    B[k] ~ dnorm(0, sd = 1)
  }
  for (tr in 1:NTraits) {
    O[tr] ~ dnorm(0, sd = 1)
  }
})
These models are notorious for being unidentifiable: the same mean abundances can be obtained from different combinations of the parameters. We therefore have to make some adjustments to the model; some of this is done during the model fitting, but for others it is easier to do it afterwards.
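The simplest of these indeterminacies is a sign flip: multiplying both the site scores and the species loadings by \(-1\) leaves the linear predictor unchanged. A minimal sketch, with arbitrary values:

```r
set.seed(42)
z <- rnorm(5); gamma <- rnorm(3); LVsd <- 0.8
eta1 <- LVsd * outer(z, gamma)    # contribution of the LV to eta
eta2 <- LVsd * outer(-z, -gamma)  # both signs flipped
all.equal(eta1, eta2)             # TRUE: the likelihood cannot tell them apart
```

This is why the chains are post-processed below, where the sign is fixed using the diagonal of \(\boldsymbol{B}\).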
inits <- function(consts){
  B = rnorm(consts$NEnv)
  O = rnorm(consts$NTraits)
  varepsilon = rnorm(consts$NSpecies)
  epsilon = rnorm(consts$NSites)
  list(
    B = B,
    O = O,
    epsilon = epsilon,
    varepsilon = varepsilon,
    Sitesd = rexp(1),
    Speciesd = rexp(1),
    beta0 = rnorm(consts$NSpecies),
    LVsd = rexp(1)
  )
}
Because we are going to parallelise the model fitting, we put the code to do this in a function:
# Function to run one chain: it can be done with HMC or other MCMC algorithms.
RunOneChain <- function(seed, dat, code, inits, consts, Nburn=5e3, NIter=5.5e4, Nthin=10, HMC = FALSE, ...) {
  require(nimble)
  require(nimbleHMC)
  Mons <- c("beta0", "Speciesd", "Sitesd", "LVsd", "B", "O", "epsilon", "varepsilon")
  if(HMC) { # HMC everything
    res <- nimbleHMC(code = code, constants = consts,
                     inits = inits(consts), data = dat, monitors = Mons,
                     niter = NIter, nburnin = Nburn, thin = Nthin, nchains = 1,
                     samplesAsCodaMCMC = TRUE)
  } else {
    mod <- nimbleModel(code = code, name = "HO", constants = consts,
                       inits = inits(consts), data = dat)
    model <- compileNimble(mod)
    conf <- configureMCMC(model, monitors = Mons, print = FALSE)
    # Change the samplers to block updates, using a nifty extension of the slice sampler
    conf$removeSamplers(c('epsilon', 'varepsilon'))
    conf$addSampler(target = 'epsilon', type = "AF_slice")
    conf$addSampler(target = 'varepsilon', type = "AF_slice")
    mcmc <- buildMCMC(conf)
    cmcmc <- compileNimble(mcmc, project = model)
    res <- runMCMC(cmcmc, niter = NIter, nburnin = Nburn, thin = Nthin,
                   nchains = 1, samplesAsCodaMCMC = TRUE, ...)
  }
  return(res)
}
# Function to run parallel chains
ParaNimble <- function(NChains, ...) {
  require(parallel)
  nimble_cluster <- makeCluster(NChains)
  samples <- parLapply(cl = nimble_cluster, X = 1:NChains, ...)
  stopCluster(nimble_cluster)
  # Name the chains in the list
  chains <- setNames(samples, paste0("chain", 1:length(samples)))
  chains
}
Now we can run the MCMC.
chains.unsgn <- ParaNimble(4, fun = RunOneChain,
                           dat = dat,
                           code = OneLV.nim,
                           inits = inits, HMC = TRUE,
                           # Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                           Nburn=1e3, NIter=2e3, Nthin=1, # for a big HMC run
                           consts = consts)
## Loading required package: parallel
There are a lot of parameters, so a lot of results to look at. Here we present the traceplots of \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent variable plot.
# Function to apply sign constraints to one set of parameters
# iter is one iteration of chain x that we need to correct for sign
ProcessOneDraw <- function(iter, Consts){
  if(is.null(Consts$nLVs)) Consts$nLVs <- 1
  # We start by separating into the different components: B, O, epsilon, varepsilon
  bs <- iter[grep("^B", names(iter))]
  os <- iter[grep("^O", names(iter))]
  varepsilon <- matrix(iter[grep("^varepsilon", names(iter))],
                       ncol = Consts$nLVs, nrow = Consts$NSpecies)
  # The epsilons, not to be confused with the varepsilons
  epsilon <- matrix(iter[grep("^epsilon", names(iter))],
                    ncol = Consts$nLVs, nrow = Consts$NSites)
  # To matrix
  B <- matrix(bs, ncol = Consts$nLVs, nrow = Consts$NEnv)
  O <- matrix(os, ncol = Consts$nLVs, nrow = Consts$NTraits)
  # Get the signs of the main diagonal entries of B
  signs <- sign(diag(B))
  if(all(signs > 0)){ # if these are all positive, no sign swapping is needed, so we're done
    return(iter)
  }else{
    sign.mat = diag(signs, ncol = Consts$nLVs, nrow = Consts$nLVs)
    # otherwise, we need to swap!
    B = B %*% sign.mat
    O = O %*% sign.mat
    epsilon = epsilon %*% sign.mat
    varepsilon = varepsilon %*% sign.mat
    # now put the sign-swapped coefficients back
    iter[grep("^B", names(iter))] <- c(B)
    iter[grep("^O", names(iter))] <- c(O)
    iter[grep("^epsilon", names(iter))] <- c(epsilon)
    iter[grep("^varepsilon", names(iter))] <- c(varepsilon)
    # and on to the next iteration
    return(iter)
  }
}
# Function to process a chain, looping over the iterations, to ensure identifiability
postProcess <- function(ch, konsts, progress=FALSE){
  require(coda)
  # ch should be a list of length nchains
  total <- length(ch)
  if(progress) pb <- txtProgressBar(min = 0, max = total, style = 3)
  i = 0
  lst <- lapply(ch, function(x, Cnsts){
    res <- t(apply(as.matrix(x), 1, ProcessOneDraw, Consts = Cnsts))
    i <<- i + 1
    # update the progress bar
    if(progress) setTxtProgressBar(pb, i)
    # restore the parameter names
    colnames(res) <- colnames(x)
    return(as.mcmc(res))
  }, Cnsts = konsts)
  if(progress) close(pb)
  as.mcmc.list(lst)
}
Now we can look at some lovely chains, to see that they have converged. First the environmental effects:
# post-process chains for sign-swapping
chains <- postProcess(chains.unsgn, konsts = consts)
summ.HO = summary(chains)
library(basicMCMCplots)
par(mfrow=c(1,1))
chainsPlot(chains, var = c("B"), legend = F,traceplot=T)
Then the trait effects
chainsPlot(chains, var = c("O"), legend = F,traceplot=TRUE)
These look quite OK (after post-processing). Now we can create a plot of the LV-scores against their indices to inspect the results:
ExtractMeans <- function(summ, name, d=1) {
  if(is.null(d)) d <- 1
  var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)), "Mean"]
  matrix(var, ncol = d)
}
# First a plot of the site scores
eps.m <- ExtractMeans(summ.HO, name="epsilon", d=consts$nLVs)
B.m <- ExtractMeans(summ.HO, name="B", d=consts$nLVs)
LVs <- X%*%B.m+eps.m
# Now the species loadings
vareps.m <- ExtractMeans(summ.HO, name="varepsilon", d=consts$nLVs)
O.m <- ExtractMeans(summ.HO, name="O", d=consts$nLVs)
gamma <- TR%*%O.m+vareps.m
par(mfrow=c(1,2))
plot(LVs,type="n", xlab="Sites", ylab="Latent variable 1", main = "Sites")
text(LVs,labels=1:consts$NSites)
plot(gamma,type="n", xlab="Species", ylab="Latent variable 1", main = "Species")
text(gamma,labels=vegan::make.cepnames(colnames(Y)))
More than one dimension requires adding extra identifiability constraints. Now that we have two or more latent variables, we also have to worry about rotating them.
There are various ways to place the constraints, some more numerically stable than others. Generally, the model on the link scale is similar to existing matrix decompositions, so much about the required constraints can be learned from those.
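To see the rotation problem concretely: with two or more latent variables, rotating the site scores and species loadings by any orthogonal matrix gives exactly the same linear predictor. A small sketch with arbitrary values (taking \(\boldsymbol{\Sigma}\) as the identity for simplicity):

```r
set.seed(1)
z     <- matrix(rnorm(10), 5, 2)  # site scores, 2 LVs
gamma <- matrix(rnorm(6), 3, 2)   # species loadings
theta <- pi/3
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2) # rotation matrix
all.equal(z %*% t(gamma), (z %*% Q) %*% t(gamma %*% Q)) # TRUE: Q %*% t(Q) = I cancels
```

So without further constraints, the posterior contains a continuum of rotated solutions, and the chains cannot converge to a single one.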
To prevent rotational invariance, we require \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) to have zeros above the main diagonal. van der Veen et al (2023) instead assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix, as they encountered numerical instability with the constraint that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while our formulation allows us to use standard out-of-the-box Markov chain Monte Carlo samplers (and it seems to work fine here). Note that when \(d \geq p,t\) the model is again rotationally invariant, and additional constraints (e.g., order constraints for the latent variables, as in a varimax rotation) would need to be added. The new model code is almost the same as before:
# Update our constants from before with the new number of LVs, rest remains the same
consts$nLVs <- nLVs <- 2
# Model code
HO.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
      eta[i, j] <- beta0[j] + sum(gammas[j, 1:nLVs]*LVsd[1:nLVs]*zs[i, 1:nLVs])
      log(lambda[i, j]) <- eta[i, j]
      Y[i, j] ~ dpois(lambda[i, j])
    }
    for (q in 1:nLVs) {
      XB[i, q] <- sum(X[i, 1:NEnv]*B[1:NEnv, q])
      epsilon[i, q] ~ dnorm(0, sd = Sitesd[q]) # Residual site effect
      z[i, q] <- XB[i, q] + epsilon[i, q]
    }
  }
  for (j in 1:NSpecies) {
    for (q in 1:nLVs) {
      omegaTR[j, q] <- sum(TR[j, 1:NTraits]*O[1:NTraits, q])
      varepsilon[j, q] ~ dnorm(0, sd = Speciesd[q]) # Residual species effect
      gamma[j, q] <- omegaTR[j, q] + varepsilon[j, q]
    }
    beta0[j] ~ dnorm(0, sd = 1)
  }
  # Constrain the upper triangle to 0
  # stole some code from Boral for this - thanks Francis
  for (i in 1:(nLVs-1)) {
    for (j in (i+1):nLVs) {
      B[i, j] <- 0
      O[i, j] <- 0
    }
  }
  for (i in 1:nLVs) {
    # diagonal elements
    B[i, i] ~ dnorm(0, sd = 1)
    O[i, i] ~ dnorm(0, sd = 1) # T(dnorm(0, sd = 1), 0, Inf)
    # standardise z and gamma, as in the one-dimensional model
    zmu[i] <- mean(z[1:NSites, i])
    zs[1:NSites, i] <- (z[1:NSites, i] - zmu[i])/sqrt(mean((z[1:NSites, i] - zmu[i])^2))
    gammamu[i] <- mean(gamma[1:NSpecies, i])
    gammas[1:NSpecies, i] <- (gamma[1:NSpecies, i] - gammamu[i])/sqrt(mean((gamma[1:NSpecies, i] - gammamu[i])^2))
    # priors for the scale parameters
    Sitesd[i] ~ dexp(1)
    Speciesd[i] ~ dexp(1)
    LVsd[i] ~ dexp(.1)
  }
  # Free lower triangle
  for (i in 2:nLVs) {
    for (j in 1:(i-1)) {
      B[i, j] ~ dnorm(0, sd = 1)
      O[i, j] ~ dnorm(0, sd = 1)
    }
  }
  # All other elements
  for (i in (nLVs+1):NEnv)    { for (j in 1:nLVs) { B[i, j] ~ dnorm(0, sd = 1) } }
  for (i in (nLVs+1):NTraits) { for (j in 1:nLVs) { O[i, j] ~ dnorm(0, sd = 1) } }
})
inits <- function(consts){
  B = matrix(rnorm(consts$nLVs*consts$NEnv), ncol = consts$nLVs)
  B[upper.tri(B)] = 0
  O = matrix(rnorm(consts$nLVs*consts$NTraits), nrow = consts$NTraits)
  O[upper.tri(O)] = 0
  varepsilon = mvtnorm::rmvnorm(consts$NSpecies, rep(0, consts$nLVs), diag(consts$nLVs))
  epsilon = mvtnorm::rmvnorm(consts$NSites, rep(0, consts$nLVs), diag(consts$nLVs))
  list(
    B = B,
    O = O,
    epsilon = epsilon,
    varepsilon = varepsilon,
    Sitesd = rexp(consts$nLVs),
    Speciesd = rexp(consts$nLVs),
    beta0 = rnorm(consts$NSpecies),
    LVsd = rexp(consts$nLVs)
  )
}
And run it:
HO.2LV <- ParaNimble(4, fun = RunOneChain,
                     dat = dat,
                     code = HO.nim,
                     inits = inits, HMC = TRUE,
                     # Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                     # niter=1e6, nburnin = 5e4, thin=1e2, nchains = 1,
                     Nburn=1e3, NIter=4e3, Nthin=1, # for a big HMC run
                     consts = consts)
Again, check they work. Environment first:
# post-process chains for sign-swapping
chains2LV <- postProcess(HO.2LV, konsts = consts)
summ.HO2LV = summary(chains2LV)
par(mfrow=c(1,1))
chainsPlot(chains2LV, var = c("B"), legend = F, traceplot=TRUE)
And the trait effects:
chainsPlot(chains2LV, var = c("O"), legend = F, traceplot=TRUE)
The traceplots look quite OK. Now, we can make two-dimensional ordination plots of sites and species, with their predictor effects.
AddArrows <- function(coords, marg = par("usr"), col = 2) {
  origin <- c(mean(marg[1:2]), mean(marg[3:4]))
  Xlength <- diff(marg[1:2])/2
  Ylength <- diff(marg[3:4])/2
  # scale the arrows so they fit inside the plot region
  ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * .8
  arrows(x0 = origin[1], y0 = origin[2],
         x1 = ends[, 1] + origin[1], y1 = ends[, 2] + origin[2],
         col = col, length = 0.1)
  text(x = origin[1] + ends[, 1] * 1.1,
       y = origin[2] + ends[, 2] * 1.1,
       labels = rownames(coords), col = col)
}
# First a plot of the site scores
eps.m <- ExtractMeans(summ.HO2LV, name="epsilon", d=consts$nLVs)
B.m <- ExtractMeans(summ.HO2LV, name="B", d=consts$nLVs)
rownames(B.m) <- colnames(X)
LVs <- X%*%B.m+eps.m
# Now the species loadings
vareps.m <- ExtractMeans(summ.HO2LV, name="varepsilon", d=consts$nLVs)
O.m <- ExtractMeans(summ.HO2LV, name="O", d=consts$nLVs)
rownames(O.m) <- colnames(TR)
gamma <- TR%*%O.m+vareps.m
par(mfrow=c(1,2))
#LVs
plot(LVs,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(LVs,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")
#gammas
plot(gamma,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")
## Warning in arrows(x0 = origin[1], y0 = origin[2], x1 = ends[, 1] + origin[1], :
## zero-length arrow is of indeterminate angle and so skipped
We can also look at some summary statistics for the scale parameters of the latent variables, species-specific residuals and site-specific residuals, respectively:
SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistics)),c("Mean", "SD")]
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:30%;'")
| | Mean | SD |
|---|---|---|
| LVsd[1] | 0.73 | 0.13 |
| LVsd[2] | 0.73 | 0.13 |
| Sitesd[1] | 0.45 | 0.44 |
| Sitesd[2] | 0.48 | 0.48 |
| Speciesd[1] | 0.04 | 0.05 |
| Speciesd[2] | 0.07 | 0.09 |
These tell us how well the predictors explain the ordination; the species- and site-specific scale parameters would be zero if the predictors fully explained the ordination. The scale parameters of the latent variables are similar to the singular values (the square roots of eigenvalues) in a classical ordination: they reflect a dimension's importance to the response.
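One rough way to summarise this is the proportion of variance in the site scores and species loadings that the predictors explain. This is only a sketch: it reuses the posterior means `B.m`, `O.m` and the `SDs` table computed above, and plugging in posterior means ignores their uncertainty:

```r
# Proportion of site-score variance explained by the environment, per LV
varXB <- apply(X %*% B.m, 2, var)
varXB / (varXB + SDs[grep("Site", rownames(SDs)), "Mean"]^2)
# Proportion of species-loading variance explained by the traits, per LV
varTRO <- apply(TR %*% O.m, 2, var)
varTRO / (varTRO + SDs[grep("Specie", rownames(SDs)), "Mean"]^2)
```

Values near 1 mean the regression part dominates the residual scale parameter; values near 0 mean the ordination is mostly unexplained by the predictors.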
The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.
The residual covariance matrix of the model (from which species associations are calculated) is determined by three terms. Expanding \(\boldsymbol{z}_i^\top\boldsymbol{\Sigma}\boldsymbol{\gamma}_j\) and removing the fixed part \(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_j\), the random part is
\[\begin{equation} \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i + \boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j. \end{equation}\]
The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations; its distribution involves \(\mathcal{K}_0\), the zero order modified Bessel function of the second kind. Consequently, the covariance for species \(j,j2\) and sites \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2})+ \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}). \end{multline}\] Third order terms are zero for central normal random variables, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similarly for \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\).
Consequently, for a species association matrix where we consider the block of the covariance matrix where \(i = i2\), or for the site-to-site matrix where we consider the block \(j = j2\), we have:
\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\] and
\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\] We can visualise these matrices, e.g. for the first site and the first species respectively.
# Species
spec.mat <- matrix(0, nrow = NSpecies, ncol = NSpecies)
Sigma <- diag(SDs[grep("LV", rownames(SDs)), 1]^2)
Sigma.sp <- diag(SDs[grep("Specie", rownames(SDs)), 1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)), 1]^2)
for(j in 1:NSpecies){
  for(j2 in 1:NSpecies){
    if(j == j2){
      spec.mat[j,j2] = X[1,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[1,,drop=F])
    }
    spec.mat[j,j2] = spec.mat[j,j2] +
      TR[j,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[j2,,drop=F]) +
      2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}
spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat,type = "lower",order = "AOE", main = "Species", mar = c(1,1,1,1),tl.srt=45,tl.cex = .5)
# Sites
site.mat <- matrix(0, nrow = NSites, ncol = NSites)
for(i in 1:NSites){
  for(i2 in 1:NSites){
    if(i == i2){
      site.mat[i,i2] = TR[1,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[1,,drop=F])
    }
    site.mat[i,i2] = site.mat[i,i2] +
      X[i,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[i2,,drop=F]) +
      2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma))
  }
}
site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:NSites)
corrplot::corrplot(site.cor.mat,type = "lower",order = "AOE", main = "Sites", mar = c(3,3,3,3),tl.srt=45,tl.cex = .5)